The data set, Checkouts-SPL-Data, comes from the Seattle Public Library and covers items with at least 10 checkouts between 2017 and 2023. I chose this data set because I want to explore people’s reading preferences among popular books, meaning books with at least 10 checkouts: for example, which books are selected most and in which year they were published. I also want to use the data to analyze reading habits, such as how long people spend on one book and what format they use to read. Reading is a good habit to have, and if more people enjoy it, books can spread different cultures, ideas, and knowledge more widely. Understanding people’s reading preferences and habits also gives technology companies and publishing houses a foundation for targeting readers more effectively.
There are six questions I want to explore in this project.
What is the name of the book that was read the most times in a month?
The most popular book in a month is named So You Want to Talk about Race (Unabridged).
How many times was that book checked out in that month?
It was checked out 4,903 times within that month.
Which publication year is most common among the books people like to read?
People like to read books that were published in 2017.
How long is the average borrowing/reading period for people?
This question cannot be answered with the given data set, since it does not record how long each book was held or how quickly it changed hands.
What is the difference between the number of people who like to read paper books and e-books?
From 2017 to 2023, 104,706 more people preferred reading paper books than ebooks.
Which checkout type was the most popular during these years?
The most popular checkout type is Horizon.
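The answerable questions above could be computed along the following lines. This is only a minimal sketch in Python, not the code used for this report. The sample rows are made up for illustration, and the column names “Checkouts”, “CheckoutMonth”, and “PublicationYear” are assumptions about the data set that are not described in this report. The sketch also assumes that each row holds one title’s checkout count for one month, and that popularity is measured by summing checkouts.

```python
import re
from collections import Counter

# Hypothetical sample rows; values are invented for illustration only.
rows = [
    {"Title": "Book A", "UsageClass": "Digital", "CheckoutType": "OverDrive",
     "CheckoutYear": 2020, "CheckoutMonth": 6, "Checkouts": 40, "PublicationYear": "2018"},
    {"Title": "Book B", "UsageClass": "Physical", "CheckoutType": "Horizon",
     "CheckoutYear": 2019, "CheckoutMonth": 3, "Checkouts": 55, "PublicationYear": "[2017]"},
    {"Title": "Book C", "UsageClass": "Physical", "CheckoutType": "Horizon",
     "CheckoutYear": 2021, "CheckoutMonth": 1, "Checkouts": 30, "PublicationYear": "c2017"},
    {"Title": "Book A", "UsageClass": "Digital", "CheckoutType": "OverDrive",
     "CheckoutYear": 2020, "CheckoutMonth": 7, "Checkouts": 25, "PublicationYear": "2018"},
]


def top_monthly_title(rows):
    """Return (title, checkouts) for the row with the largest monthly count."""
    best = max(rows, key=lambda r: r["Checkouts"])
    return best["Title"], best["Checkouts"]


def total_checkouts_by(rows, column):
    """Sum the Checkouts column for each distinct value of `column`."""
    totals = Counter()
    for r in rows:
        totals[r[column]] += r["Checkouts"]
    return totals


def most_popular_publication_year(rows):
    """Return the publication year with the most checkouts.

    Assumes the year may be wrapped in extra characters (for example
    "[2017]"), so the first four-digit number in the field is used.
    """
    totals = Counter()
    for r in rows:
        match = re.search(r"\d{4}", r["PublicationYear"])
        if match:
            totals[int(match.group())] += r["Checkouts"]
    return totals.most_common(1)[0][0]


def physical_minus_digital(rows):
    """Return physical checkouts minus digital checkouts."""
    totals = total_checkouts_by(rows, "UsageClass")
    return totals["Physical"] - totals["Digital"]


def most_popular_checkout_type(rows):
    """Return the checkout type with the most checkouts."""
    return total_checkouts_by(rows, "CheckoutType").most_common(1)[0][0]


print(top_monthly_title(rows))
print(most_popular_publication_year(rows))
print(physical_minus_digital(rows))
print(most_popular_checkout_type(rows))
```

Counting rows instead of summing checkouts is another reasonable definition of popularity and could give different answers, so the choice should be stated alongside any result.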
Exploring the questions above gives us a basic understanding of people’s reading preferences and saves time when we use the data set later. Since the data contain no information about how long a person holds a book, we will not plot anything for that question. We also found that the most popular vendor is Horizon, so we could use a pie chart to visualize the proportion of each vendor and check whether Horizon takes the largest share. People also favor books published in 2017; we can do some research into why books published in that year are so popular and gain more insight into that. The other values explored above can likewise be used to make better use of the data and create better visualizations.
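Before drawing the pie chart, the share of each checkout type can be checked numerically. The sketch below is a minimal example under stated assumptions: the (checkout type, checkouts) pairs are invented sample values, and `checkout_type_shares` is a hypothetical helper name, not part of the project.

```python
from collections import Counter

# Hypothetical (checkout type, monthly checkouts) pairs for illustration.
sample = [
    ("Horizon", 55),
    ("OverDrive", 40),
    ("Horizon", 30),
    ("OverDrive", 25),
]


def checkout_type_shares(pairs):
    """Return each checkout type's fraction of all checkouts."""
    totals = Counter()
    for checkout_type, checkouts in pairs:
        totals[checkout_type] += checkouts
    grand_total = sum(totals.values())
    return {name: count / grand_total for name, count in totals.items()}


shares = checkout_type_shares(sample)
largest = max(shares, key=shares.get)
for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name}: {share:.1%}")
```

These fractions are the values a pie chart would display, so the type with the largest fraction is the one that should take the largest slice.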
The data are collected by the Seattle Public Library and include a monthly count of checkouts by title for physical and electronic items. The full data set starts in April 2005, but we use only the part covering 2017 to 2023. The variables include “UsageClass”, which denotes whether an item is physical or digital; “CheckoutType”, which indicates the vendor tool; “CheckoutYear”, the year in which the item was checked out; and “Title”, the full name of the book, among others.
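The subset described above (2017 to 2023, at least 10 checkouts) could be selected from the full file roughly as follows. This is a sketch rather than the actual preparation step: the inline CSV is invented sample data, the “Checkouts” column name is an assumption, and the 10-checkout threshold is assumed to apply to each monthly row.

```python
import csv
import io

# Hypothetical CSV excerpt; a real run would open the downloaded file instead.
SAMPLE_CSV = "\n".join([
    "UsageClass,CheckoutType,CheckoutYear,Checkouts,Title",
    "Physical,Horizon,2016,50,Old Row",
    "Digital,OverDrive,2018,9,Rarely Borrowed",
    "Physical,Horizon,2019,12,Kept Physical",
    "Digital,OverDrive,2023,10,Kept Digital",
])


def keep_row(row, first_year=2017, last_year=2023, min_checkouts=10):
    """Return True if the row falls in the year range and meets the threshold."""
    year = int(row["CheckoutYear"])
    checkouts = int(row["Checkouts"])
    return first_year <= year <= last_year and checkouts >= min_checkouts


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
filtered = [row for row in reader if keep_row(row)]
kept_titles = [row["Title"] for row in filtered]
print(kept_titles)
```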
Checkout data come from several current and historical sources. For digital items, for example, the media vendors Overdrive, hoopla, Freegal, and RBDigital provide usage data. The data were collected to improve residents’ quality of life by providing more transparency, accountability, and comparability, which also gives a foundation for economic development, research, and internal management.
We should consider whether the data include personal information that raises privacy concerns. For example, readers’ real names and borrowing preferences should not be disclosed to the public. That is exactly what the SPL does: it hides the names of its readers to protect their privacy while still providing data to the public.
However, this data set does not record how long people held each book, so it is hard to determine the interval at which the same book changes hands.